Geometry of Resource Interaction - A Minimalist Approach
The Resource λ-calculus is a variation of the λ-calculus in which arguments can be superposed and must be used linearly. It is therefore a model of linear, non-deterministic programming languages, and the target language of the Ehrhard-Taylor expansion of λ-terms. In a strictly typed restriction of the Resource λ-calculus, we study the notion of path persistence, and we define a Geometry of Interaction that characterises it. The construction is also invariant under reduction and able to count the addends in normal forms.
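Linearity and superposition can be illustrated with a minimal sketch (names and representation are illustrative, not taken from the paper): applying a function whose body uses its variable n times to a bag of superposed resources reduces to the formal sum over all ways of distributing the resources among the occurrences, which is also how addends in normal forms get counted.

```python
from itertools import permutations

def resource_apply(occurrences, bag):
    """Apply a function whose body uses its variable `occurrences` times
    to a bag (multiset) of resource arguments. Linearity means each
    resource is used exactly once, so the result is the formal sum of
    all distributions; an arity mismatch annihilates the term (empty sum)."""
    if occurrences != len(bag):
        return []                     # 0: no linear distribution exists
    # each permutation is one addend of the non-deterministic sum
    return list(permutations(bag))

# (λx. x x) applied to the bag [a, b] reduces to the sum (a b) + (b a)
print(resource_apply(2, ["a", "b"]))  # -> [('a', 'b'), ('b', 'a')]
print(resource_apply(1, ["a", "b"]))  # -> [] (arity mismatch, term is 0)
```

The length of the returned list is the number of addends in the normal form, matching the counting ability mentioned above.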
Is the Optimal Implementation Inefficient? Elementarily Not
Sharing graphs are a local and asynchronous implementation of lambda-calculus beta-reduction (or linear logic proof-net cut-elimination) that avoids useless duplications. Empirical benchmarks suggest that they are among the most efficient machineries when one wants to fully exploit the higher-order features of the lambda-calculus. However, we still lack theoretically solid grounds to dispel uncertainties about the adoption of sharing graphs.
Aiming at analysing in detail the worst-case overhead of sharing operators, we restrict to the case of elementary and light linear logic, two subsystems of multiplicative exponential linear logic with bounded computational complexity. In these two cases the bookkeeping component is unnecessary, and sharing graphs simplify to the so-called "abstract algorithm". By a modular cost comparison over a syntactical simulation, we prove that the overhead of shared reductions is quadratically bounded in the cost of the naive implementation, i.e. proof-net reduction. This result generalises and strengthens a previous complexity result, and implies that the price of sharing is negligible compared to the benefits obtainable on reductions requiring a large amount of duplication.
API Comparison of CPU-To-GPU Command Offloading Latency on Embedded Platforms (Artifact)
High-performance heterogeneous embedded platforms allow offloading of parallel workloads to an integrated accelerator, such as General Purpose Graphic Processing Units (GP-GPUs). A time-predictable characterization of task submission is a must in real-time applications. We provide a profiler of the time spent by the CPU in submitting a stereotypical GP-GPU workload, shaped as a Deep Neural Network of parameterized complexity. The submission is performed using the latest available APIs: NVIDIA CUDA, including its various submission techniques, and Vulkan. Complete automation of the tests on the Jetson Xavier is also provided by scripts that install the software dependencies, run the experiments, and collect the results in a PDF report.
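The measured quantity is the CPU-side submission time, not kernel execution. A minimal sketch of that methodology is below; `submit` is a hypothetical stand-in for a real asynchronous enqueue (a CUDA kernel launch or a Vulkan queue submission), since no GPU API is assumed available here.

```python
import statistics
import time

def profile_submission(submit, iters=1000):
    """Time only the CPU-side submission call: an asynchronous enqueue
    returns before the offloaded work completes, so the interval below
    captures driver/API overhead rather than kernel execution."""
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        submit()                      # asynchronous enqueue, returns immediately
        samples.append(time.perf_counter() - t0)
    return {"mean": statistics.mean(samples),
            "p99": sorted(samples)[int(0.99 * iters)]}

# no-op stand-in workload in place of a real GPU enqueue
stats = profile_submission(lambda: None)
print(stats["mean"] >= 0.0, stats["p99"] >= 0.0)  # -> True True
```

Reporting a high percentile alongside the mean is what makes such a profile useful for real-time characterization, where worst-case behaviour matters more than the average.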
Novel Methodologies for Predictable CPU-To-GPU Command Offloading
There is increasing industrial and academic interest in a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU in submitting GP-GPU operations. We show that the impact of CPU-to-GPU kernel submissions may indeed be relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms.
This is the case when an application is composed of many small, consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open-standard GPU API that allows improved control of GPU command submissions. We show that this added control may significantly improve application performance and predictability thanks to a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous real-time systems.
Our findings are evaluated on a latest-generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity.
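Why batching many small submissions pays off can be seen with a toy cost model (an illustrative assumption, not a model from the paper): every CPU-to-GPU submission pays a fixed driver overhead, and pre-recording the whole sequence into one command buffer pays it only once.

```python
def total_time(n_ops, submit_cost, op_cost, batched):
    """Toy cost model: each submission pays fixed driver overhead
    `submit_cost`; batching all ops into one pre-recorded command
    buffer (in the spirit of Vulkan command buffers) pays it once."""
    if batched:
        return submit_cost + n_ops * op_cost
    return n_ops * (submit_cost + op_cost)

# 100 small ops whose driver overhead (10 us) dwarfs the op itself (1 us):
# batching pays the overhead once instead of 100 times.
print(total_time(100, 10e-6, 1e-6, batched=False))  # ~1.1e-3 s
print(total_time(100, 10e-6, 1e-6, batched=True))   # ~1.1e-4 s
```

Under this model the unbatched cost grows with `n_ops * submit_cost`, which is exactly the driver-interaction term that the approaches above aim to eliminate.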
Design and operation of the air-cooled beam dump for the extraction line of CERN's Proton Synchrotron Booster (PSB)
A new beam dump has been designed, built, installed and operated to withstand the future proton beam extracted from the Proton Synchrotron Booster (PSB) in the framework of the LHC Injector Upgrade (LIU) Project at CERN, consisting of up to 1E14 protons per pulse at 2 GeV, foreseen after the machine upgrades planned for CERN's Long Shutdown 2 (2019-2020). To efficiently dissipate the heat deposited by the primary beam, the new dump was designed as a cylindrical block assembly, made out of a copper alloy and cooled by forced airflow. To determine the energy density distribution deposited by the beam in the dump, Monte Carlo simulations were performed using the FLUKA code, and thermo-mechanical analyses were carried out by importing the energy density into ANSYS. In addition, Computational Fluid Dynamics (CFD) simulations of the airflow were performed in order to accurately estimate the convective heat transfer coefficient on the surface of the dump. This paper describes the design process, highlights the constraints and challenges of integrating a new dump for increased beam power into the existing facility, and provides data on the operation of the dump.
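The thermal challenge can be quantified directly from the beam parameters quoted above. A short back-of-the-envelope check of the energy carried by one pulse (the fraction actually deposited in the dump depends on the FLUKA results, so this is an upper bound):

```python
# One LIU-era PSB pulse: up to 1e14 protons at 2 GeV kinetic energy.
protons_per_pulse = 1e14
kinetic_energy_eV = 2e9
eV_to_J = 1.602176634e-19          # exact SI conversion factor

pulse_energy_J = protons_per_pulse * kinetic_energy_eV * eV_to_J
print(round(pulse_energy_J / 1e3, 1))  # -> 32.0 (kJ carried per pulse)
```

Tens of kilojoules per pulse, repeated over many cycles, is what motivates the copper-alloy block and the forced-air cooling described in the abstract.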
The Key Role of Memory in Next-Generation Embedded Systems for Military Applications
With the increasing use of multi-core platforms in safety-related domains, aircraft system integrators and authorities exhibit concern about the impact of concurrent access to shared resources on the Worst-Case Execution Time (WCET). This paper highlights the need for accurate memory-centric scheduling mechanisms to guarantee prioritized memory accesses to real-time, safety-related components of the system. We implemented a software technique called cache coloring, which demonstrates that isolation at the timing and spatial level can be achieved by managing the lines that can be evicted in the cache. To show the effectiveness of this technique, the timing properties of a real application are considered as a use case; this application is made of parallel tasks that exhibit different trade-offs between computation and memory load.
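The idea behind cache coloring can be sketched as follows: physical pages that map to the same group of cache sets share a "color", and giving disjoint colors to different tasks prevents them from evicting each other's lines. The cache geometry used here (2 MiB, 16-way, 64 B lines, 4 KiB pages) is an illustrative assumption, not a parameter from the paper.

```python
def page_color(phys_addr, page_size=4096, cache_size=2 * 1024 * 1024, ways=16):
    """Return the cache color of the page containing `phys_addr`.
    The color is given by the set-index bits that lie above the page
    offset; pages with equal colors contend for the same cache sets."""
    line = 64
    sets = cache_size // (ways * line)       # 2048 sets in this geometry
    colors = (sets * line) // page_size      # 32 distinct colors
    return (phys_addr // page_size) % colors

# pages 32 * 4 KiB = 128 KiB apart wrap around to the same color,
# so they can evict each other's cache lines; adjacent pages cannot.
print(page_color(0x0), page_color(32 * 4096), page_color(4096))
```

A coloring allocator partitions physical memory so that a safety-critical task only ever receives pages of its reserved colors, which is the isolation mechanism the abstract refers to.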
FIB-SEM investigation and uniaxial compression of flexible graphite
Flexible graphite (FG) with a density of 1-1.2 g/cm³ is employed as the beam-energy absorber material in the CERN Large Hadron Collider (LHC) beam dumping system. However, the increase in deposited energy expected for the new HL-LHC (High-Luminosity LHC) design demanded an improvement in the reliability and safety of the beam dumping devices, and the need for a calibrated material model suitable for high-level FE simulations has been prioritized. This work sets the basic knowledge to develop a material model for FG suitable for this aim. A review of the FG properties available in the literature is first given, followed by a FIB-SEM (Focused Ion Beam - Scanning Electron Microscopy) microstructure investigation and by monotonic and cyclic uniaxial compression tests. Similarities with other well-known groups of materials, such as crushable foams, crumpled materials and compacted powders, are discussed. A simple 1D phenomenological model has been used to fit the experimental stress-strain curves, and the accuracy of the result supports the assumption that the graphite-like microstructure and the crumpled meso-structure play the major role under out-of-plane uniaxial compression.
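The abstract does not state the form of the 1D phenomenological model, so the following is only an illustration of the fitting step: a one-parameter stiffening law of the kind often used for crushable and compacted media, with a closed-form least-squares fit on synthetic data.

```python
import math

def fit_compaction_modulus(strains, stresses, eps0=0.1):
    """Illustrative one-parameter law (an assumption, not the paper's
    model): sigma = A * (exp(eps/eps0) - 1), capturing the progressive
    stiffening of crushable media under uniaxial compression.
    For a single modulus A the least-squares fit has a closed form:
    A* = sum(sigma_i * f_i) / sum(f_i^2), with f_i = exp(eps_i/eps0) - 1."""
    f = [math.exp(e / eps0) - 1.0 for e in strains]
    return sum(s * fi for s, fi in zip(stresses, f)) / sum(fi * fi for fi in f)

# synthetic stress-strain data generated with A = 3.0 (units arbitrary)
strains = [0.05, 0.10, 0.20, 0.30]
stresses = [3.0 * (math.exp(e / 0.1) - 1.0) for e in strains]
print(round(fit_compaction_modulus(strains, stresses), 3))  # -> 3.0
```

With real test data the residual of such a fit is what quantifies the "accuracy of the result" mentioned above.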
Optimality of optimality - Complexity of reduction on sharing graphs
Thirty years ago the concept of optimality was formulated in theoretical terms by Lévy, but only a decade later did Lamping manage to give it an elegant algorithmic implementation. He realized a graph-reduction system that would later turn out to have interesting analogies with linear logic, introduced in the same period by Girard.
But is optimality really optimal? In other words, is the optimal implementation of the λ-calculus realized through sharing graphs really the best reduction strategy, in terms of complexity?
After years of unfounded doubts and undeserved oblivion, at the LICS conference in 2007 Baillot, Coppola and Dal Lago gave a first positive answer, albeit a partial one. They considered the particular case of elementary and light affine logic, which possess interesting properties at the level of intrinsic complexity and simplify the arduous problem.
The first part of this thesis presents, in summary, the theory of optimality and its shared implementation.
The second part addresses the topic of its complexity, beginning with an overview of the most important related results. The subsequent introduction to affine logics, and to their characteristics, constitutes the necessary premise to the two following chapters, which present an alternative and original proof of the latest results, based precisely on EAL and LAL. In the first of the two chapters an intermediate system between the proof nets of the logics and graph reduction is defined; in the second, correctness and optimality of the shared implementation are proved by means of a simulation.
Throughout the treatment, some reflections are offered on the internal dynamics of β-reduction and on its ties with the proof nets of linear logic.
Sharing, superposition and expansion: Geometric studies on the semantics and implementation of lambda-calculi and proof nets
Elegant semantics and efficient implementations of functional programming languages can both be described by the very same mathematical structures, most prominently within the Curry-Howard correspondence, where programs, types and execution respectively coincide with proofs, formulae and normalisation. Such flexibility is sharpened by the deconstructive and geometrical approach pioneered by linear logic (LL) and proof-nets, and by Lévy-optimal reduction and sharing graphs (SG).
Adapting Girard's geometry of interaction, this thesis introduces the geometry of resource interaction (GoRI), a dynamic and denotational semantics which describes, algebraically by their paths, the terms of the resource calculus (RC), a linear and non-deterministic variation of the ordinary lambda calculus (LC). Infinite series of RC-terms are also the domain of the Taylor-Ehrhard-Regnier expansion, a linearisation of the LC. The thesis explains the relation between the expansion and reduction by proving that they commute, and provides an expanded version of the execution formula to compute paths in the typed LC.
SG are an abstract implementation of the LC and proof-nets whose steps are local and asynchronous, and whose sharing involves both terms and contexts. Whilst experimental tests on SG show outstanding speedups, up to exponential, with respect to traditional implementations, sharing comes at a price. The thesis proves that, in the restricted case of elementary proof-nets, where only the core of SG is needed, such a price is at most quadratic, hence harmless.